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Abstract: 


Multiview light sheet fluorescence microscopy (LSFM) allows to image de¬ 
veloping organisms in 3D at unprecedented temporal resolution over long 
periods of time. The resulting massive amounts of raw image data requires 
extensive processing interactively via dedicated graphical user interface 
(GUI) applications. The consecutive processing steps can be easily au¬ 
tomated and the individual time points can be processed independently, 
which lends itself to trivial parallelization on a high performance computing 
(HPC) cluster. Here we introduce an automated workflow for processing 
large multiview, multi-channel, multi-illumination time-lapse LSFM data on 
a single workstation or in parallel on a HPC cluster. The pipeline relies on 
snakemake to resolve dependencies among consecutive processing steps 
and can be easily adapted to any cluster environment for processing LSFM 
data in a fraction of the time required to collect it. 


Availability: 

The code is distributed free and open source under the MIT license http:// 
opensource.org/licenses/MIT. The source code can be downloaded from 
github: https://github.com/mpicbg-scicomp/snakemake-workflows. Docu¬ 
mentation can be found here: http://fiji.se/Automated workflow for parallel 
Multi-view Reconstruction. 


Contact: 

schmied(a)mpi-cbg.de 


1 Introduction 

The duration and temporal resolution of 3D fluorescent imaging of living biological 
specimen is limited by the amount of laser light exposure the sample can survive. 
LSFM alleviates this by illuminating only the imaged plane thus reducing photo dam¬ 
age dramatically. Additionally LSFMs achieve fast acquisition rates due to sensitive 
wide-field detectors and in Selective Plane Illumination Microscopy (SPIM), sample 
rotation enables complete coverage of large, non-transparent specimen. Taken to¬ 
gether, LSFMs allow imaging of developing organisms in toto at single cell resolution 
with unprecedented temporal resolution over long periods of time (Huisken et al., 
2004; Keller et al., 2008). 
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This powerful technology produces massive, terabyte size datasets that need com¬ 
putationally expensive and time-consuming processing before analysis. Existing soft¬ 
ware solutions implemented in Fiji (Preibisch et a!., 2010, 2014; Preibisch, unpub¬ 
lished; Schmied et al., 2014) or in ZEISS ZEN black are performing chained pro¬ 
cessing steps on a single computer and require user inputs via a GUI. As the spatial 
and temporal resolution of the light sheet data increase, such approaches become 
inconvenient since processing can take days. 

In controlled experiments, SPIM image processing is robust enough to be auto¬ 
mated and key steps are independent from time point to time point. HPC is inherently 
designed for such time consuming and embarrassingly parallel tasks that require no 
user interaction. Therefore, we developed an automated workflow with minimum user 
interaction that is easily scalable to multiple datasets or time points on a cluster. In 
combination with the appropriate computing resources it enables for the first time pro¬ 
cessing of SPIM data that is faster than the total acquisition time required for collecting 
the raw images. 


2 Processing workflow 

The Fiji SPIM processing pipeline uses Flierarchical Data Format (HDF5) as data 
container for the originally generated TIFF or CZI files by custom made (Pitrone et al., 
2013) or commercial SPIM microscopes (Fig. 1A,B). Following format conversion, 
multiview registration aligns the different acquisition angles (views) within each time 
point (Fig. 1C), and subsequent time-lapse registration stabilizes the recording over 
time (Preibisch et al., 2010) (Fig. ID). Fusion combines the registered views of one 
time point into a single volume by averaging or multiview deconvolution (Preibisch 
eta!., 2010, 2014) (Fig. 1E,F). The result is a set of FIDF5 files containing registered 
and fused multiview SPIM data that can be examined locally or remotely using the 
BigDataViewer (Pietzsch etal., 2015). 

All steps are implemented as plugins (Preibisch etal., 2010, 2014; Preibisch, un¬ 
published; Pietzsch et al., 2015), in the open-source platform Fiji (Schindelin etal., 
2010).We use these plugins by executing them from the command line as Fiji bean- 
shell scripts (Suppl. Fig. 1). To overcome the legacy dependency of Fiji on the GUI we 
encapsulate it in a virtual framebuffer [xvfb) that simulates a monitor in the headless 
cluster environment (Suppl. Fig. 1). 

To map and dispatch the workflow logic to a single workstation or on a HPC cluster, 
we use the automated workflow engine snakemake (Koster and Rahmann, 2012). 
The workflow is defined using a Snakefile containing the name, input and output 
file names of each of the processing steps and python code calling the beanshell 
scripts (Suppl. Fig. 1). Upon invocation, the snakemake rule engine resolves the 
dependencies between individual processing steps based on the input files required 
and the output files produced during the workflow. It also creates the command that 
fits the input/output rule description and the template command as defined in the 
Snakefile. Most importantly, if single tasks on individual files are discovered to be 
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independent, they are invoked in parallel (Suppl. Fig. 2). Each instance of snakemake 
for one dataset is independent and thus the workflow can be applied simultaneously 
to multiple dataset. 

The required parameters for processing are collected by the user during GUI pro¬ 
cessing of an exemplary time point and entered into a .yam/configuration file (Suppl. 
List 1). The workflow is executed by passing the .yam/file to snakemake on the com¬ 
mand line (Suppl. Fig. 1). Importantly, from the user perspective the launching of the 
pipeline on a FIPC cluster and on a local workstation appears identical and require a 
single command (Suppl. List 2). If the parameters are chosen correctly and the local 
or FIPC resources are sufficient (Suppl. Table 1 and 2) no further action from the user 
is necessary. 

Snakemake supports multiple back ends to perform the command dispatch: local, 
cluster and Distributed Resource Management Appiication API {DRMAA){K6ster and 
Rahmann, 2012). The local back end creates a new sub-shell and calls the com- 
mand(s) required. The cluster back end is a general interface to FIPC batch systems 
based on string substitution. DRMAA specifies a system library that interfaces all 
common batch systems based on a generalized task model, thus multiple batch sys¬ 
tems are supported through one interface. 


3 Results 



1 TP local 

90 TPs local 

90 TPs cluster 

Resave to hdf5 

3 

262 

15 

Detection and registration 

2.5 

221 

15 

Average fusion 

7 

661 

47 

Deconvolution (GPU) 

21 

1874 

740 

Resave output 

3 

286 

7 

Total with average fusion 


23 h 56 min 

1 h 31 min 

Total with deconvolution (GPU) 


44 h 08 min 

13 h 10 min 


Table 1 : Processing time comparison. Time (minutes) for key processing steps that 
are parallelized on a cluster. Total processing time including non parallel processing 
steps on the example dataset using either average fusion or deconvolution. 

We compared the performance of the pipeline on a 175 GB, single channel SPIM 
recording of a Drosophila embryo consisting of 90 time points and 5 views, processed 
either on a single computer or on a FIPC cluster (Table 1). The processing using 
average fusion takes almost precisely one day on a single powerful computer. In 
contrast, using the full cluster resource the dataset can be processed in 1 h 31 min, 
which represents a 16-fold speedup in processing. Since the time-lapse covers 23 
hours of Drosophila embryonic development the processing becomes real time with 
respect to the acquisition. Using deconvolution on a cluster with only 4 GPUs (Suppl. 
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A 

.czi Dataset: 

spim.czi 
spim(1).czi 
spim(2).czi... 

.tif / ome-tiff Dataset: 

spim_TL1_Angle1 .tif 
spim_TL1 _Angle2.tif 
spim_TL1_Angle3.tif... 


B 

Define dataset 


Define hdf5 dataset 



Resave to hdf5 


hdf5_Data.xml 
hdf5_Example-00-00.h5 
hdf5_Example-01-OO.hS ... 



C Detection and registration 



Merge xml 


D Time lapse registration 


E Multiview Fusion 



.hdfS Dataset: output as .tif: 


hdf5_Data.xml TP1_ChO_IIIO_Ang1,2,3.tif 

hdf5_Example-00-00.h5 TP2_ChO_IIIO_Ang1,2,3.tif 
hdf5_Example-01 -OO.hS TP3_ChO_IIIO_Ang1,2,3.tif 
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Figure 1: Automated workflow for Multiview processing. Workflow for SPIM image 
processing (A-E) using parallelization (B, C and E). Shown on the right yz-slices in 
the BigDataViewer of a Drosophila embryo expressing histone H2Av-mRFPruby raw 
(A) registered (C) and deconvolved (E). Results of deconvolution with xy-, xz- and 
xz-slices through the fused volume of the same embryo (F). Scale bars represent 50 
[xm. 
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Table 1) still brings a more than 3-fold speed up (Table 1). A dataset of 2.2 TB in size 
with 715 time points (Schmied etal., 2014) would take an estimated week to process 
on a single computer. Using this method the processing is reduced to only 15 h with 
typical cluster workload from other users. 


4 Conclusion and Outlook 

The biologist's goal is to analyse, for instance, cellular behaviour using time-lapse 
SPIM recordings. The steps between data acquisition and analysis are of rather 
technical interest. Our pipeline leverages HPC to reduce the notoriously difficult and 
time-consuming SPIM data processing to a single autonomous command. Future 
improvements of the workflow will provide greater accessibility to novice users by 
using the UNICORE GUI framework (Almond and Snelling, 1999). Ultimately, we 
aim for a completely unsupervised automated processing similar to grid computing 
practiced in fields facing similar big data challenges such as particle physics and 
molecular simulation (Bird, 2011; Gesing etal., 2012) 
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Supplementary data 


Architecture of automated workflow for Multiview Reconstruction 


A .yam/file B Snakefile 


C FijLbeanshell.bsh 


D Output: 


hdf5_xmLfilename 

ntimepoints 

angles 

channels 


workflow 

input and output rules 
resolves dependencies 
dispatches jobs 
splits parallel jobs 


Execute Fiji plugins 


Results of processing 
./og files of plugins 
Cluster, err 
Cluster, out 



Input; 

TP1,TP2, TP3... 



1 .job 

2. job 

3. job 

4. job 

5. job 



Fiji processing via 
virtual frame buffer (xvfb) 


Suppl. Fig. 1 : Conceptual architecture for processing a multiview dataset.Time-lapse 
recording of Histone-YFP expression during Drosophila melanogaster embryogene- 
sis with 5 views. The parameters are determined prior to the automated processing 
and stored in a .yami configuration file (A). These parameters are passed to a Snake- 
file, which contains the logic of the workflow (B). Upon execution of snakemake and 
presence of the input files (e.g. images) snakemake dispatches the jobs which call 
Fiji beanshell scripts to carry out the processing using Fiji (C). The output generated 
by the workflow triggers the next batch of jobs once the input rules of the next step are 
fulfilled. Additionally, the processing writes log files and the cluster error and output 
files (D). 
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define xml czl 


..'Z 

f hdfS xml \- 


resave_hdf5 ' t resave hdf5 
flle_kj: 03 ' , fileJdT 02 

\ xml_bas€; dataset_one J \ xml_base; datasel_or>e 


resavehdfS resave_hdf5 

flle_id: 00 ' file_id; 04 

xml_base: dataset_one J xml_base: dataset_or>e 


resave_hdf5 \ 

file_id: 01 ' 

xmj_base; dataset_one I 




y 


regist ration 

_ J 

( \ 
regist rat ion 

_ J 

regist rat km 


regist rat ion 


( \ 
registration 

_ J 


timelapse 


external transform 


deconvolution 


^^deconvoiutio^ 


deconvolution ; deconvolution 


^^deconvolutto^ 


deflr>e_out put 


hdf5_xml_output 



Suppl. Fig. 2: Dependency graph of the snakemake workflow. Example of a directed 
acyclic graph (DAG) for processing a dataset with 5 time points. Snakemake resolves 
the file dependencies (arrows) between the different processing steps (boxes, each 
step with different colour). Jobs are dispatched when the input rule of the first pro¬ 
cessing step is fulfilled (A). The next batch is sent when all outputs of the processing 
step are created and the input rule of the next step is fulfilled (B-E). Independent tasks 
in the same processing step are dispatched in parallel, i.e. parallel processing of time 
points (B, E). 
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1. Processing switches 

2. Define dataset 

2.1. General Settings 

hdf5_xml_filename: '"dataset"', 
ntimepoints: 90, 
angles: "0,72,144,216,288", 
channels: "green", 
illumination: "0", 

2.2. Settings for .czi files 

first_czi: "name.czi", 

2.3. Settings for .tif datasets 

image_file_pattern: 'img_TL{{t}}_Angle{{a}}.tif', 
multiple_channels: "'NO (one channel)"', 

3. Detection and registration 

3.1 Channel settings 

reg_process_channel: "'All channels'", 
source_channel: "red", 
target_channel: "green", 
reg_lnterest_points_channel: '"beads'", 
type_of_detection: '"Difference-of-Gaussian"', 

3.2 Settings for Difference-of-Mean 

3.3 Settings for Difference-of-Gaussian 

sigma: '1.3', 

threshold_gaussian: '0.025', 

4. Time-lapse registration 

reference_timepoint: '45', 

5. Weighted-average fusion 

downsample: 'T, 
minimal_x: '274', 
minimaLy: '17', 
minimal_z: '-423', 
maximal_x: '1055', 
maximaLy: '1928', 
maximal_z: '480', 

6. Multiview deconvolution 

6.1. External transformation 

external_trafo_switch: "transform", 

matrixjransform: '"0.5, 0.0, 0.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.5, 0.0'", 

6.2. Deconvolution settings 

iterations: '15', 
minimal_x_deco: '137', 
minimal_y_deco: '-8', 
minimal_z_deco: '-21T, 
maximal_x_deco: '527', 
maximal_y_deco: '964', 
maximal_z_deco: '240', 

detections_to_extract_psf_for_channel: '"beads'", 

7. Resave output 

8. Software directories 

9. Fiji resource settings 

10. Advanced settings 

Suppl. List 1. Parameter List for processing. List of the key input parameters in 
the .yarn! V\\e for multiview, multi-channel multi-illumination side processing of time- 
lapse SPIM recordings. The parameters are sorted to match the logic of the 
processing workflow: define and resave to hdf5 (2) > detect and register (3) > time 
lapse registration (4) > fuse (5 -i- 6) and resave output (7). The key parameters 
(highlighted in red) are recorded during manual processing of the reference time 
point in the GUI. 
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Local back end: 

/pathAo/snakemake/snakemake -j 1 -d /path/to/data/ 

To specify the configuration script for the queuing system: 
--cluster-config ./cluster.json 

For DRMAA back end add: 

--drmaa" -q {cluster.Isf_q}{cluster.lsf_extra}" 

For LSF backend add: 

--cluster "bsub -q {cluster.lsf_q}{cluster.lst_extra}” 

Flag for number of jobs run in parallel: 

-j <n umber of jobs> 

Flag for specifying data location and config file: 

-d /path/to/data/ 

Flag for dry run of snakemake: 

-n 

Force the execution of a rule: 

-R <name of rule> 

To save error and output files of cluster add: 

-drmaa" -q {cluster.lsf q}{cluster.lsf extra} -o test.out -e test.err" 
-cluster "bsub -q {cluster.lsf_q}{cluster.lsf_extra}-o test.out -e test.err" 


Suppl. List 2. List of commands for the snakemake workflow. 
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A Workstation 

Processor 

2x hexa-core Intel Xeon E5-2630 @ 2.30 GHz 

Memory 

128 GB (16x8GB) @1600 MHz, DDR3 ECC RDIMM 

GPU 

2x NVIDIA Quadro 4000, 2 GB of Memory, 256 CUDA cores 

Hard Drive 

4x 4 TB SATA disks in RAID 5 configuration 



B MPI-CBG Cluster "MadmaV 

Processor 

44x Nodes with 2x hexa-core Intel Xeon E5-2640 @ 2.50 GHz 

Total No. of cores 

528 

Memory per node 

128 GB (16x8GB) @ 1333 MHz 

GPU 

4x nodes with NVIDIA Tesla M2090 Fermi Generation @ 

1.3GHz, 6GB GDDR5 Memory @ 1.85GHz, 512 CUDA cores 



Lustre storage server 

Storage 

Fully redundant Lustre volume with 200 TB of usable space 

Storage 

architecture 

2 metadata servers in active/passive configuration with a shared 
disk enclosure and 4 object storage servers with 3 disk 
enclosures delivering 2.5 GB/s each and 10 GB/s aggregated 

Interconnect 

InfiniBand QDR 40Gbps fully non-blocking fat tree topology 


Suppl. Table 1. Hardware used for processing. Parameters of the computer 
hardware used for the automated processing. Standalone desktop machine (A) and 
HPC cluster (B). 
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Job name 

Average memory (MB) 

Average CPU time (sec) 

#jobs 

Detine dataset 

2315 

908 

1 

Detine hdt5 dataset 

2158 

39 

1 

Resave to hdt5 

2827 

530 

90 

Detection and registration 

7189 

1388 

90 

Merge xmi 

3 

43 

1 

Time lapse registration 

2534 

953 

1 

Average tusion 

7761 

3806 

90 

Deconvolution GPU 

27171 

7485 

90 

Detine output 

3 

23 

1 

Detine hdf5 output 

2 

32 

1 

Resave output to hdt5 

4918 

534 

90 


Suppl. Table 2. Cluster processing resource summary. Average resources per 
job were determined on the 175 GB example dataset with 90 time points. 
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